Robust Voice Activity Detector for Real World Applications Using Harmonicity and Modulation Frequency
Authors
Abstract
This paper examines robust detection of distant speech in low-SNR environments for automatic speech recognition (ASR), using a two-stage approach based on two distinguishing features of speech: harmonicity and modulation frequency (MF). A modified harmonicity metric serves as a gating function for a set of parallel classifiers that incorporate MFs computed on different frequency bands. Performance is evaluated both at the frame level, in terms of discriminative power, and at the system level, in terms of ASR results on a real-world robotic forklift task. Compared with previously proposed features such as relative spectral entropy, and with classification strategies involving MFs, the combined approach generalizes well across different kinds of dynamic noise conditions and significantly improves the false alarm rate at low speech miss rate settings. The overall ASR results also improved significantly compared with the ETSI AMR-VAD2, while the number of false alarms was reduced by a factor of two.
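The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method. The harmonicity measure here is a plain normalized autocorrelation peak (the paper's "modified metric" is not given in the abstract), and the parallel per-band classifiers are replaced by simple thresholds on the share of envelope energy at 2 to 8 Hz. All function names, band edges, and thresholds are assumptions chosen for illustration.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx]

def harmonicity(frame, sr, f0_min=80.0, f0_max=400.0):
    """Stand-in harmonicity: peak of the normalized autocorrelation
    over lags that correspond to plausible pitch values (assumed range)."""
    frame = frame - frame.mean()
    energy = float(np.dot(frame, frame))
    if energy <= 1e-12:
        return 0.0
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr / f0_max)
    hi = min(int(sr / f0_min), len(frame) - 1)
    return float(ac[lo:hi + 1].max() / energy)

def modulation_ratio(envelope, frame_rate, mf_lo=2.0, mf_hi=8.0, min_depth=0.1):
    """Share of envelope-spectrum energy inside [mf_lo, mf_hi] Hz.
    Returns 0 for nearly flat envelopes (modulation depth below min_depth)."""
    mean = envelope.mean()
    if mean <= 0 or envelope.std() / mean < min_depth:
        return 0.0
    env = (envelope - mean) * np.hanning(len(envelope))
    spec = np.abs(np.fft.rfft(env)) ** 2
    freqs = np.fft.rfftfreq(len(env), d=1.0 / frame_rate)
    in_band = (freqs >= mf_lo) & (freqs <= mf_hi)
    return float(spec[in_band].sum() / spec.sum())

def is_speech(segment, sr, frame_len=256, hop=80, gate=0.5, min_voiced=0.2,
              bands=((100, 1000), (1000, 2000), (2000, 3500)),
              mf_threshold=0.5):
    """Stage 1: harmonicity gate. Stage 2: per-band MF vote (assumed rule)."""
    frames = frame_signal(segment, frame_len, hop)
    voiced = np.array([harmonicity(f, sr) >= gate for f in frames])
    if voiced.mean() < min_voiced:
        return False  # gate closed: too few harmonic frames
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    votes = 0
    for lo, hi in bands:
        env = power[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
        votes += int(modulation_ratio(env, sr / hop) >= mf_threshold)
    return bool(votes > len(bands) / 2)
```

With these assumed settings, a harmonic signal whose amplitude is modulated at about 4 Hz passes both stages, white noise is rejected at the gate, and a steady harmonic tone passes the gate but is rejected by the modulation stage. This is the kind of complementary behavior the abstract attributes to combining the two features.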
Similar resources
Incremental acoustic subspace learning for voice activity detection using harmonicity-based features
This paper presents a novel voice activity detection (VAD) approach based on incremental subspace learning using harmonicity-based features. Harmonic structure is well known as a noise-robust speech feature. We develop a novel harmonicity-based feature based on temporal-spectral co-occurrence patterns. At the statistical decision stage, many conventional statistical VAD methods rely on a Gaussian model; how...
A low-complexity voice activity detector for smart hearing protection of hyperacusic persons
In this paper, a Voice Activity Detector (VAD) is proposed for smart hearing protection applications where speech is to get through the hearing protector while ambient noise is to be blocked out. The VAD calculates a short-term statistical assessment of the temporal envelopes within different frequency bands. This assessment uses the Inter-Quartile Range (IQR) and reflects the dispersion of the...
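As a small illustration of the statistic named in this abstract, the inter-quartile range of a band's temporal envelope can be computed as below. The function name and the use of a single window are assumptions. The abstract is truncated, so how the cited paper turns this value into a decision is not shown here.

```python
import numpy as np

def envelope_iqr(envelope):
    """Inter-quartile range (75th minus 25th percentile) of envelope samples."""
    q75, q25 = np.percentile(envelope, [75, 25])
    return float(q75 - q25)
```

A flat envelope gives an IQR of zero, while an envelope that fluctuates over the window gives a larger value.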
New harmonicity measures for pitch estimation and voice activity detection
Harmonic structure can be easily recognized in the time-frequency representation of speech signals, even in diverse environments. Harmonicity is a measure of the completeness of harmonic structure. This paper extends the use of the conventional harmonicity measure to the tasks of pitch estimation and voice activity detection. A set of hierarchical harmonicities, including grid, temporal, spect...
Speech event detection using multiband modulation energy
The need for efficient, sophisticated features for speech event detection is inherent in state of the art processing, enhancement and recognition systems. We explore ideas and techniques from non-linear speech modeling and analysis, like modulations and multiband filtering and propose new energy and spectral content features derived through filtering in multiple frequency bands and tracking dom...
A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...
Publication date: 2011